#LLM Safety

3 articles

Tech · Apr 27, 2026 · 9 min

Second-order injection targeting safety monitor evaluators

An LLM safety monitor's evaluator can be tricked into clearing dangerous sessions when an attacker plants fake analysis text in the monitored conversation. Covers experimental results, the limits of defenses, and structural separation points.

Security LLM Prompt Injection LLM Safety AI Agents

Tech · Apr 4, 2026 · updated · 14 min

All Claude tiers jailbroken: AFL attack and the structural failure of constitutional safety

A security researcher bypassed Claude Opus 4.6's policy evaluation with just four short prompts and got it to generate attack code against live infrastructure. Also covered: 915 files exfiltrated from the sandbox.

Security Claude Anthropic LLM Safety Jailbreak AI Agents

Tech · Mar 12, 2026 · 15 min

Prompt injection countermeasures using GitHub's agent execution platform and OpenAI IH-Challenge

GitHub has published the layered defense design of its agent execution platform, and OpenAI has released IH-Challenge, an instruction-hierarchy training dataset, along with a model. Together they address prompt injection along two axes: infrastructure design and training.

A.I. Security GitHub OpenAI AI Agent LLM Safety